Generating optimal CUDA sparse matrix-vector product implementations for evolving GPU hardware
Authors
Abstract
The CUDA model for GPUs presents the programmer with a plethora of programming options. These include different memory types, different memory access methods, and different data types. Identifying which options to use, and when, is a non-trivial exercise. This paper explores the effect of these options on the performance of a routine that evaluates sparse matrix-vector products across three generations of NVIDIA GPU hardware. A process for analysing performance and selecting the subset of implementations that perform best is proposed. The potential for mapping sparse matrix attributes to the optimal CUDA sparse matrix-vector product implementation is discussed.
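The operation under study is the sparse matrix-vector product y = Ax. As a point of reference for the GPU implementations the paper compares, the sketch below shows the computation on the CPU using the CSR (compressed sparse row) layout — a common starting point for GPU SpMV kernels. The function name and data layout here are illustrative, not taken from the paper.

```python
# Minimal CPU reference for the sparse matrix-vector product y = A*x,
# using the CSR (compressed sparse row) layout. Names are illustrative.

def csr_spmv(row_ptr, col_idx, vals, x):
    """Multiply a CSR-format sparse matrix by a dense vector x."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        # The non-zeros of row i occupy vals[row_ptr[i]:row_ptr[i+1]];
        # col_idx[k] gives the column of the k-th stored entry.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

# 3x3 example: [[4, 0, 1], [0, 2, 0], [3, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
vals = [4.0, 1.0, 2.0, 3.0, 5.0]
print(csr_spmv(row_ptr, col_idx, vals, [1.0, 1.0, 1.0]))  # → [5.0, 2.0, 8.0]
```

A GPU version typically assigns one thread (or one warp) per row of this outer loop; the memory type and access method chosen for `vals`, `col_idx`, and `x` are exactly the CUDA options whose performance impact the paper evaluates.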
Similar resources
Sparse linear algebra on a GPU
We investigate what graphics processing units (GPUs) have to offer compared to central processing units (CPUs) when solving a sparse linear system of equations. This is performed by using a GPU to simulate fluid flow in a porous medium. The flow problems are discretized mainly by a mimetic finite element discretization, but also by a two-point flux approximation (TPFA) method. Both of thes...
Optimizing Memory-Bound SYMV Kernel on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming language extensions (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized n...
Optimizing Memory-Bound Numerical Kernels on GPU Hardware Accelerators
Hardware accelerators are becoming ubiquitous in high performance scientific computing. They are capable of delivering an unprecedented level of concurrent execution contexts. High-level programming languages (e.g., CUDA) and profiling tools (e.g., PAPI-CUDA, CUDA Profiler) are paramount to improving productivity while effectively exploiting the underlying hardware. We present an optimized numerical k...
The Sliced COO Format for Sparse Matrix-Vector Multiplication on CUDA-enabled GPUs
Existing formats for Sparse Matrix-Vector Multiplication (SpMV) on the GPU outperform their corresponding implementations on multi-core CPUs. In this paper, we present a new format called Sliced COO (SCOO) and an efficient CUDA implementation to perform SpMV on the GPU. While previous work shows experiments on small to medium-sized sparse matrices, we perform evaluations on large sparse m...
A Survey on Performance Modelling and Optimization Techniques for SpMV on GPUs
A sparse matrix is a matrix consisting of very few non-zero entries. Large sparse matrices are often used in engineering and scientific computations. Sparse matrix-vector multiplication in particular is an important operation for solving linear systems and partial differential equations. However, there is a possibility that even though the matrix is partitioned and stored appropriately, the performance...
Journal: Concurrency and Computation: Practice and Experience
Volume 24
Pages: -
Published: 2012